The main challenge for fine-grained few-shot image classification is to learn feature representations with higher inter-class and lower intra-class variations, with a mere few labelled samples. Conventional few-shot learning methods however cannot be naively adopted for this fine-grained setting -- a quick pilot study reveals that they in fact push for the opposite (i.e., lower inter-class variations and higher intra-class variations). To alleviate this problem, prior works predominately use a support set to reconstruct the query image and then utilize metric learning to determine its category. Upon careful inspection, we further reveal that such unidirectional reconstruction methods only help to increase inter-class variations and are not effective in tackling intra-class variations. In this paper, we for the first time introduce a bi-reconstruction mechanism that can simultaneously accommodate for inter-class and intra-class variations. In addition to using the support set to reconstruct the query set for increasing inter-class variations, we further use the query set to reconstruct the support set for reducing intra-class variations. This design effectively helps the model to explore more subtle and discriminative features which is key for the fine-grained problem in hand. Furthermore, we also construct a self-reconstruction module to work alongside the bi-directional module to make the features even more discriminative. Experimental results on three widely used fine-grained image classification datasets consistently show considerable improvements compared with other methods. Codes are available at: https://github.com/PRIS-CV/Bi-FRN.
translated by 谷歌翻译
广告分配涉及将广告和有机项目分配给有限的饲料插槽,以最大化平台收入,已成为研究热点。请注意,电子商务平台通常有多个针对不同类别的入口,并且某些入口几乎没有访问。这些入口的数据覆盖范围较低,这使得代理很难学习。为了应对这一挑战,我们提出了基于相似性的ADS分配(SHTAA)的混合转移,该转移有效地将样本和知识从数据富裕的入口转移到数据贫乏的入口。具体而言,我们为MDP定义了不确定性感知的相似性,以估计不同入口的MDP的相似性。基于这种相似性,我们设计了一种混合转移方法,包括实例传输和策略传输,以有效地将样本和知识从一个入口传递到另一个入口。 Meituan食品交付平台上的离线和在线实验都表明,该建议的方法可以在数据贫困的入口方面获得更好的性能并增加平台的收入。
translated by 谷歌翻译
随着强化学习(RL)的最新流行率,在推荐平台(例如电子商务和新闻提要网站)中利用RL来利用RL进行广泛的兴趣。为了获得更好的分配,将最近基于RL的广告分配方法的输入从点单项目升级到列表项目的布置。但是,这也导致了国家行动对的高维空间,因此很难以良好的概括能力学习列表表示。这进一步阻碍了RL药物的探索,并导致样本效率差。为了解决这个问题,我们提出了一种基于RL的新方法,用于广告分配,该方法通过利用Meituan食品交付平台上的任务特定信号来学习更好的列表表示形式。具体而言,我们根据对ADS分配的先前领域知识分别提出基于重建,预测和对比度学习的三个不同的辅助任务。我们在Meituan食品输送平台上进行了广泛的实验,以评估拟议的辅助任务的有效性。离线和在线实验结果都表明,与最先进的基线相比,提出的方法可以学习更好的列表表示形式,并获得更高的平台收入。
translated by 谷歌翻译
With the drive to create a decentralized digital economy, Web 3.0 has become a cornerstone of digital transformation, developed on the basis of computing-force networking, distributed data storage, and blockchain. With the rapid realization of quantum devices, Web 3.0 is being developed in parallel with the deployment of quantum cloud computing and quantum Internet. In this regard, quantum computing first disrupts the original cryptographic systems that protect data security while reshaping modern cryptography with the advantages of quantum computing and communication. Therefore, in this paper, we introduce a quantum blockchain-driven Web 3.0 framework that provides information-theoretic security for decentralized data transferring and payment transactions. First, we present the framework of quantum blockchain-driven Web 3.0 with future-proof security during the transmission of data and transaction information. Next, we discuss the potential applications and challenges of implementing quantum blockchain in Web 3.0. Finally, we describe a use case for quantum non-fungible tokens (NFTs) and propose a quantum deep learning-based optimal auction for NFT trading to maximize the achievable revenue for sufficient liquidity in Web 3.0. In this way, the proposed framework can achieve proven security and sustainability for the next-generation decentralized digital society.
translated by 谷歌翻译
The problem of broad practical interest in spatiotemporal data analysis, i.e., discovering interpretable dynamic patterns from spatiotemporal data, is studied in this paper. Towards this end, we develop a time-varying reduced-rank vector autoregression (VAR) model whose coefficient matrices are parameterized by low-rank tensor factorization. Benefiting from the tensor factorization structure, the proposed model can simultaneously achieve model compression and pattern discovery. In particular, the proposed model allows one to characterize nonstationarity and time-varying system behaviors underlying spatiotemporal data. To evaluate the proposed model, extensive experiments are conducted on various spatiotemporal data representing different nonlinear dynamical systems, including fluid dynamics, sea surface temperature, USA surface temperature, and NYC taxi trips. Experimental results demonstrate the effectiveness of modeling spatiotemporal data and characterizing spatial/temporal patterns with the proposed model. In the spatial context, the spatial patterns can be automatically extracted and intuitively characterized by the spatial modes. In the temporal context, the complex time-varying system behaviors can be revealed by the temporal modes in the proposed model. Thus, our model lays an insightful foundation for understanding complex spatiotemporal data in real-world dynamical systems. The adapted datasets and Python implementation are publicly available at https://github.com/xinychen/vars.
translated by 谷歌翻译
很少有图像分类是一个具有挑战性的问题,旨在仅基于少量培训图像来达到人类的识别水平。少数图像分类的一种主要解决方案是深度度量学习。这些方法是,通过将看不见的样本根据距离的距离进行分类,可在强大的深神经网络中学到的嵌入空间中看到的样品,可以避免以少数图像分类的少数训练图像过度拟合,并实现了最新的图像表现。在本文中,我们提供了对深度度量学习方法的最新审查,以进行2018年至2022年的少量图像分类,并根据度量学习的三个阶段将它们分为三组,即学习功能嵌入,学习课堂表示和学习距离措施。通过这种分类法,我们确定了他们面临的不同方法和问题的新颖性。我们通过讨论当前的挑战和未来趋势进行了少量图像分类的讨论。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译
In this work, we focus on instance-level open vocabulary segmentation, intending to expand a segmenter for instance-wise novel categories without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
translated by 谷歌翻译